Token propagation E2E tests. #581

Merged
merged 6 commits into from
Jan 22, 2020

Conversation

Collaborator

@rubenvp8510 rubenvp8510 commented Aug 6, 2019

Signed-off-by: Ruben Vargas [email protected]

Also, in my tests I realized that this will not work if the query service has the --es.use-aliases flag enabled.

@rubenvp8510
Collaborator Author

In the end, I didn't need to change anything in the image. It works if I give the test OpenShift user the cluster-admin role, because of the way the openshift-elasticsearch-plugin works [1].

If the subjectAccessReview is satisfied, the plugin assigns a role to the user. The origin-logging-elasticsearch5 image has a jaeger configuration for that role [2], which gives some users access to the indices (and that is why cluster-admin can make requests :D). This lets us test the whole thing without touching the image. I think this works, at least for testing.

  1. https://github.com/fabric8io/openshift-elasticsearch-plugin
  2. https://github.com/openshift/origin-aggregated-logging/blob/master/elasticsearch/sgconfig/sg_config.yml#L21

@rubenvp8510 rubenvp8510 force-pushed the token-propagation branch 2 times, most recently from 0bc6efa to ce44015 Compare October 10, 2019 16:20
@rubenvp8510 rubenvp8510 changed the title [WIP] Add query service token propagation support Add query service token propagation support Oct 14, 2019
@jpkrohling
Contributor

Is this ready for review? If so, there are a couple of things that need to be cleaned up first, like the commented-out code and the user change in the Makefile.

@rubenvp8510
Collaborator Author

@jpkrohling Ahh! There are a few small details I still need to fix (one of them is what you pointed out in your comment) ;)

Thanks

@rubenvp8510 rubenvp8510 force-pushed the token-propagation branch 12 times, most recently from 0d315fd to d498565 Compare October 15, 2019 01:00
@rubenvp8510
Collaborator Author

@jpkrohling it is ready now.

@rubenvp8510 rubenvp8510 force-pushed the token-propagation branch 9 times, most recently from 768a43d to 273a566 Compare October 24, 2019 02:20
Contributor

@jpkrohling jpkrohling left a comment

Overall, looks good to me. There's only one change I'd like to see, about the random string generation. The e2e tests should be reviewed by @kevinearls, and once he gives his approval, mine will follow.

pkg/util/util.go Outdated
_, err := rand.Read(randString)
if err != nil {
// If we cannot generate random, return fixed.
return "ncNDoqLGrayxXzxTn5ANbOXZp3qXd0LA"
Contributor

Return an error instead, and let the caller decide. At the very least, there should be a WARN in the logs about this. It was really bad that I didn't realize that "SECRET" should have been a real random string...
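
A minimal sketch of the "return an error and let the caller decide" shape suggested here. The name generateSecret and its length parameter are hypothetical and are not the PR's actual pkg/util code:

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// generateSecret returns n random bytes encoded as base64, or an error if
// the system's random source fails. There is no fixed fallback value: the
// caller decides how to react to the failure.
func generateSecret(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("reading random bytes: %w", err)
	}
	return base64.StdEncoding.EncodeToString(buf), nil
}

func main() {
	secret, err := generateSecret(16)
	if err != nil {
		// The caller chooses the policy: abort, retry, or log a warning.
		fmt.Println("could not generate secret:", err)
		return
	}
	fmt.Println("secret length:", len(secret))
}
```

With this shape, any fallback to a fixed value becomes an explicit, logged decision at the call site rather than something hidden inside the helper.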

args = append(args,
"--pass-access-token=true",
"--pass-user-bearer-token=true",
"--scope=user:info user:check-access user:list-projects",
Contributor

Spaces as separators? That's odd!

# install and run OCP
sudo docker cp $(docker create docker.io/openshift/origin:$OPENSHIFT_VERSION):/bin/oc /usr/local/bin/oc
oc cluster up --version=$OPENSHIFT_VERSION

oc cluster up --version=$OPENSHIFT_VERSION --public-hostname=${HOST_IP}.nip.io
Contributor

@kevinearls kevinearls left a comment

As noted the test does not run for me on OCP 4.2

defer resp.Body.Close()
return true, nil
})
require.NoError(t, err, "Token propagation tmake est failed")
Contributor

Typo?


client, err := oAuthAuthorization(queryHost, "user-test-token", "any")

require.NoError(t, err)
Contributor

The test is failing for me here when I use a real OCP 4.2 OpenShift cluster. Have you been testing against minishift? I think you probably need to use the actual route here.

Collaborator Author

I haven't tested against OCP 4.2/minishift, I'll give it a try and fix what is failing. Thanks!

oc create user user-test-token
oc adm policy add-cluster-role-to-user cluster-admin user-test-token
# for ocp 4.2
htpasswd -c -B -b users.htpasswd user-test-token any
Contributor

It's a better idea to just ship the actual file instead of generating the file at the CI.

then
echo "Running token propagation tests"
oc create user user-test-token
oc adm policy add-cluster-role-to-user cluster-admin user-test-token
Contributor

The bad thing about using "root-like" permissions in a test is that you might miss the cases where the user might not have the appropriate permissions. Not sure it's relevant for your test here, but in general, we should aim to have the minimum set of permissions possible.

args := []string{
"--cookie-secret=SECRET",
fmt.Sprintf("--cookie-secret=%s", secret),
Contributor

This generates secrets containing characters like "+" and "/". How are they translated into the final YAML? Do they cause any trouble? I don't think they do, but it's good to have positive confirmation (and perhaps a test). Examples of random strings I got from this code locally:

pZ8F0imysFsTwJ9kF8U92g==
ILmvfqWW/+bLEokY06NHig==
Zmc+jX2VWXkxsB98AIYwDg==
xb/fYVT6UHxYpNJXhjR7RQ==
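
To illustrate where the "+" and "/" come from: they are part of the standard base64 alphabet. The sketch below compares it with Go's URL-safe alphabet on a fixed input. This is only an illustration of the two encodings, not a claim about what the PR does or about what the proxy accepts as a cookie secret; the helper name encodeBoth is hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodeBoth returns the standard and the URL-safe (unpadded) base64
// encodings of the same bytes.
func encodeBoth(b []byte) (std, urlSafe string) {
	return base64.StdEncoding.EncodeToString(b), base64.RawURLEncoding.EncodeToString(b)
}

func main() {
	// These three bytes map to base64 symbol values 62, 63, 63, 62, which
	// are the only positions where the two alphabets differ.
	std, urlSafe := encodeBoth([]byte{0xfb, 0xff, 0xfe})
	fmt.Println(std)     // "+//+"
	fmt.Println(urlSafe) // "-__-"
}
```

If the special characters ever turned out to be a problem in the generated YAML or on the command line, switching alphabets would be one option; a unit test on the rendered arguments would give the positive confirmation asked for above.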

@@ -38,6 +38,53 @@ type ElasticsearchDeployment struct {
Secrets []corev1.Secret
}

func (ed *ElasticsearchDeployment) injectArguments(container *corev1.Container) {
container.Args = append(container.Args,
Contributor

Probably a matter of taste, but do you need to split this line here? The next line seems small enough to fit here.

Collaborator Author

No, I don't need to split the line. I'm not sure why I did it, or maybe make format did it for me? Anyway, I will change it.

container.Args = append(container.Args, "--es.tls=true")
}

container.Args = append(container.Args,
Contributor

Shouldn't these options be added only if es.tls=true?


}

func (suite *TokenPropagationTestSuite) TearDownSuite() {
Contributor

Do you need this func if it's empty? The commented out code should be removed anyway.

Collaborator Author

Ah, this should not be empty. I commented out those lines locally, but they shouldn't be commented out.

}

err := wait.Poll(retryInterval, timeout, func() (done bool, err error) {
req, err := http.NewRequest(http.MethodGet, suite.queryServiceEndPoint, nil)
Contributor

Check that error is nil, and return early in case of errors (return false, err). Otherwise, it will fail in the next line.

Contributor

Looks like this wasn't done yet.
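
A sketch of the early return being asked for. To keep it self-contained, poll is a hypothetical, simplified stand-in for wait.Poll with the same callback shape, and checkEndpoint is an illustrative name rather than the test's actual code:

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// poll is a hypothetical, simplified stand-in for wait.Poll: it calls cond
// until cond reports done, returns an error, or the timeout passes.
func poll(interval, timeout time.Duration, cond func() (bool, error)) error {
	deadline := time.Now().Add(timeout)
	for {
		done, err := cond()
		if err != nil {
			return err
		}
		if done {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for the condition")
		}
		time.Sleep(interval)
	}
}

// checkEndpoint builds the request inside the poll callback and returns
// early if the request cannot be constructed, instead of using a nil request.
func checkEndpoint(client *http.Client, endpoint string) error {
	return poll(10*time.Millisecond, time.Second, func() (bool, error) {
		req, err := http.NewRequest(http.MethodGet, endpoint, nil)
		if err != nil {
			return false, err // a malformed URL will not fix itself; stop polling
		}
		resp, err := client.Do(req)
		if err != nil {
			return false, nil // possibly transient: try again on the next tick
		}
		defer resp.Body.Close()
		return resp.StatusCode == http.StatusOK, nil
	})
}

func main() {
	// A control character in the URL makes http.NewRequest fail immediately.
	err := checkEndpoint(http.DefaultClient, "http://bad\x7fhost/")
	fmt.Println("error:", err)
}
```

Returning (false, err) ends the poll with the underlying cause, while (false, nil) asks for another attempt, so only errors that cannot recover stop the loop.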


/* Try to reach query endpoint */
err = wait.Poll(retryInterval, timeout, func() (done bool, err error) {
req, err := http.NewRequest(http.MethodGet, suite.queryServiceEndPoint, nil)
Contributor

Same comment as before: make the test more resilient, otherwise it might become annoying to all future PRs ;)

Contributor

You probably want to check the err objects in this method and log their causes, without hard failing.

Jar: cookieJar,
}
/* Start oauth */
resp, err := client.Get("https://" + host + "/oauth/start")
Contributor

Kind of the same as the previous comments, but you probably just want to retry in case of some well-known networking problems, like timeouts.
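
One way to retry only on timeouts, sketched with the standard library. The helpers isTimeout, retryOnTimeout, and flaky are hypothetical names used for illustration, and os.ErrDeadlineExceeded stands in for a real network timeout:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
)

// isTimeout reports whether err (or anything it wraps) is a network timeout.
func isTimeout(err error) bool {
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}

// retryOnTimeout calls fn up to attempts times. It retries only when fn
// fails with a timeout; any other error is returned immediately.
func retryOnTimeout(attempts int, fn func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil || !isTimeout(err) {
			return err
		}
	}
	return err
}

// flaky returns a function that times out the first n times it is called
// and succeeds afterwards. It only simulates an unreliable endpoint.
func flaky(n int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return os.ErrDeadlineExceeded
		}
		return nil
	}
}

func main() {
	fmt.Println(retryOnTimeout(3, flaky(2))) // succeeds on the third attempt
	fmt.Println(retryOnTimeout(2, flaky(5))) // still timing out after two attempts
}
```

In the test, fn would wrap the client.Get call to the /oauth/start endpoint.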

var req *http.Request
/* Submit form */
if hasForm(responseBytes) {
req = getLoginFormRequest(responseBytes, resp.Request.URL, user, pass)
Contributor

For OpenShift, do you really need to parse the form and simulate a real user there? Can't you just send the user:pass with basic auth? I think the Maistra components send their requests via basic auth.

Collaborator Author

Well, I can get a token and pass it to the query service, but I don't know how to do that and still test the whole thing through the oauth proxy sidecar. Can we use basic authentication and set the oauth proxy cookie? Maybe I'm missing something. Or do we not need to test using the sidecar?

Contributor

Maistra QE can confirm, but I understand that you can send basic auth and the proxy will do all the token negotiation based on it. Jaeger will see a token, just like if it was a browser session.
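
For reference, sending the credentials with basic auth from Go looks roughly like the sketch below. Whether the proxy then negotiates the token is the assumption discussed in this thread and is not verified here; basicAuthRequest is a hypothetical helper and query.example.test is a placeholder host:

```go
package main

import (
	"fmt"
	"net/http"
)

// basicAuthRequest builds a GET request that carries the credentials in an
// "Authorization: Basic" header instead of submitting the login form.
func basicAuthRequest(url, user, pass string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, pass)
	return req, nil
}

func main() {
	// query.example.test is a placeholder, not a real route.
	req, err := basicAuthRequest("https://query.example.test/", "user-test-token", "any")
	if err != nil {
		fmt.Println("building request:", err)
		return
	}
	user, _, ok := req.BasicAuth()
	fmt.Println(user, ok)
}
```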

@rubenvp8510 rubenvp8510 force-pushed the token-propagation branch 2 times, most recently from 7100f85 to 512ee05 Compare January 14, 2020 18:56
@rubenvp8510 rubenvp8510 force-pushed the token-propagation branch 3 times, most recently from dc777fc to 25d8e83 Compare January 14, 2020 20:04
Signed-off-by: Ruben Vargas <[email protected]>

Signed-off-by: Ruben Vargas <[email protected]>
Contributor

@jpkrohling jpkrohling left a comment

LGTM. A couple of test resiliency + negative tests requests, but this could be merged as is and get the remaining comments addressed in a follow-up PR.

func proxyInitArguments(jaeger *v1.Jaeger) []string {
secret, err := util.GenerateProxySecret()
if err != nil {
jaeger.Logger().Warnf("Error generating secret: %s, fallback to fixed secret", secret)
Contributor

WithError(err), so that the cause is logged.

}

err := wait.Poll(retryInterval, timeout, func() (done bool, err error) {
req, err := http.NewRequest(http.MethodGet, suite.queryServiceEndPoint, nil)
Contributor

Looks like this wasn't done yet.

APIVersion: "jaegertracing.io/v1",
},
}))
assert.NoError(t, framework.AddToFrameworkScheme(apis.AddToScheme, &esv1.ElasticsearchList{
Contributor

Perhaps you want to use require here instead of assert?


/* Try to reach query endpoint */
err = wait.Poll(retryInterval, timeout, func() (done bool, err error) {
req, err := http.NewRequest(http.MethodGet, suite.queryServiceEndPoint, nil)
Contributor

You probably want to check the err objects in this method and log their causes, without hard failing.


client, err := oAuthAuthorization(host, username, password)

err = wait.Poll(retryInterval, timeout, func() (done bool, err error) {
Contributor

Those tests seem to be missing.

@jpkrohling
Contributor

@kevinearls could you give this a test in OpenShift before this gets merged?

@kevinearls
Contributor

@jpkrohling Sure, I'll try it later today or first thing tomorrow

@kevinearls
Contributor

@jpkrohling @rubenvp8510 I've tried this on an OCP 4.2 cluster, but unfortunately have not been able to get it to work so far. The TestTokenPropagationNoToken test passes, but TestTokenPropagationNoToken fails at line 134 with a 403.

I will note a couple of issues I came across in the code, but I don't have any ideas so far on how to get this to work.

@rubenvp8510
Collaborator Author

rubenvp8510 commented Jan 16, 2020

@kevinearls I'll check. TestTokenPropagationNoToken should return a 403 (Forbidden); that is the expected behavior. I'll check the other test to see why it does not pass.

require.Equal(t, http.StatusOK, resp.StatusCode)
if resp.StatusCode != http.StatusOK {
return false, errors.New("Query service returns http code: " + string(resp.StatusCode))
}
Contributor

The if is redundant here: if the require.Equal fails, the test will terminate. You should pick one or the other. In general I prefer the require, as it makes for more concise code and produces better error messages. However, if you prefer the if, you need to change string(resp.StatusCode) to strconv.Itoa(resp.StatusCode).

Collaborator Author

I prefer require too, but I had to do it this way so that if the first attempt fails because the pod is not fully ready, or because of other timing issues, wait.Poll will retry.

If I use require, the test fails immediately, which makes it less reliable, IMHO.

I removed the require line.
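
The strconv.Itoa point from this thread, as a small sketch. Converting an integer directly to a string yields the Unicode character with that code point, not the decimal digits. The helper names statusMessage and asRune are hypothetical:

```go
package main

import (
	"fmt"
	"strconv"
)

// statusMessage formats an HTTP status code as decimal text.
func statusMessage(code int) string {
	return "Query service returns http code: " + strconv.Itoa(code)
}

// asRune shows what a direct integer-to-string conversion produces: the
// character whose code point is the given number.
func asRune(code int) string {
	return string(rune(code))
}

func main() {
	fmt.Println(statusMessage(403)) // ends in "403"
	fmt.Printf("%q\n", asRune(403)) // a single non-ASCII character, not "403"
}
```

fmt.Errorf with the %d verb would work equally well for building the error message.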

}

func bindOperatorWithAuthDelegator() {
roleBinding := rbac.ClusterRoleBinding{
Contributor

This clusterrolebinding and the token-test-user-bind clusterrolebinding created below need to be deleted at the end of the test (in TearDownSuite()) otherwise the test will fail after the first run as these will already exist on the cluster.

Collaborator Author

Thanks for the observation. I've added a cleanup routine that removes all the created roles and role bindings.

@kevinearls
Contributor

@rubenvp8510 Sorry, my comment about the test failure was wrong. TestTokenPropagationValidToken is the test that fails, TestTokenPropagationNoToken works correctly.

Sorry about the careless cut and paste on my part. 😧

@rubenvp8510
Collaborator Author

I ran the tests a couple of times; sometimes they fail and sometimes they don't. I'll make this more reliable.

Signed-off-by: Ruben Vargas <[email protected]>
Contributor

@kevinearls kevinearls left a comment

LGTM. Works for repeated runs on OCP 4.2

@jpkrohling jpkrohling merged commit c6fe438 into jaegertracing:master Jan 22, 2020
@jpkrohling
Contributor

Thanks for being persistent, @rubenvp8510!
